Overview

Dataset statistics

Number of variables: 19
Number of observations: 4724
Missing cells: 4289
Missing cells (%): 4.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 701.3 KiB
Average record size in memory: 152.0 B
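
The derived figures in this block can be recomputed from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables times observations) and the average record size is total memory divided by the number of observations. The variable names are illustrative and not part of the report.

```python
# Recompute the derived dataset statistics from the raw counts in the report.
n_variables = 19
n_observations = 4724
missing_cells = 4289
total_memory_kib = 701.3

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells          # share of empty cells
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the 4.8% and 152.0 B shown in the report.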

Variable types

Numeric: 10
Text: 4
Categorical: 5

Alerts

Category is highly overall correlated with Source [High correlation]
Google WC is highly overall correlated with Joon WC v1 [High correlation]
Joon WC v1 is highly overall correlated with Google WC [High correlation]
Sentence Count v1 is highly overall correlated with Sentence Count v2 [High correlation]
Sentence Count v2 is highly overall correlated with Sentence Count v1 [High correlation]
Source is highly overall correlated with Category [High correlation]
Source is highly imbalanced (51.3%) [Imbalance]
MPAA Max is highly imbalanced (57.6%) [Imbalance]
British Words has 4280 (90.6%) missing values [Missing]
ID has unique values [Unique]
Excerpt has unique values [Unique]
BT Easiness has unique values [Unique]
BT s.e. has unique values [Unique]
British WC has 4280 (90.6%) zeros [Zeros]
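
The report does not state how the imbalance percentage is defined. One plausible reading, sketched below, is one minus the Shannon entropy of the class shares divided by the maximum possible entropy for that number of classes. This definition is an assumption; it is used here because it is consistent with the 57.6% reported for MPAA Max when applied to that variable's class counts (G 3706, PG 928, PG-13 87, R 3). The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# MPAA Max class counts as listed in this report: G, PG, PG-13, R
mpaa_counts = [3706, 928, 87, 3]
score = imbalance_score(mpaa_counts)
print(f"MPAA Max imbalance: {score:.1%}")
```

A score of 0 means perfectly balanced classes and a score near 1 means almost all rows fall in one class. The Source figure cannot be rechecked the same way, because the report groups nine of its classes under "Other values".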

Reproduction

Analysis started: 2023-12-04 14:27:23.570362
Analysis finished: 2023-12-04 14:27:47.719344
Duration: 24.15 seconds
Software version: ydata-profiling v4.6.2
Download configuration: config.json
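
The reported duration is the difference between the two timestamps. A small sketch using only the Python standard library, with the timestamps copied from this section:

```python
from datetime import datetime

# Start and finish timestamps as printed in the Reproduction section.
started = datetime.fromisoformat("2023-12-04 14:27:23.570362")
finished = datetime.fromisoformat("2023-12-04 14:27:47.719344")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```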

Variables

ID
Real number (ℝ)

UNIQUE 

Distinct: 4724
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4366.3472
Minimum: 400
Maximum: 8031
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 37.0 KiB

Quantile statistics

Minimum: 400
5-th percentile: 1215.15
Q1: 2769.75
Median: 4483.5
Q3: 5939.25
95-th percentile: 7239.7
Maximum: 8031
Range: 7631
Interquartile range (IQR): 3169.5

Descriptive statistics

Standard deviation: 1896.3637
Coefficient of variation (CV): 0.43431354
Kurtosis: -1.0449349
Mean: 4366.3472
Median Absolute Deviation (MAD): 1606.5
Skewness: -0.092433569
Sum: 20626624
Variance: 3596195.3
Monotonicity: Strictly increasing
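
Several of the ID statistics are simple functions of the others. The sketch below recomputes them from the printed values using the standard definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). Because the inputs are already rounded in the report, the last digits may differ slightly.

```python
# Values copied from the ID variable's statistics in this report.
minimum, maximum = 400, 8031
q1, q3 = 2769.75, 5939.25
mean, std = 4366.3472, 1896.3637

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance as the square of the standard deviation

print(value_range, iqr, round(cv, 6), round(variance, 1))
```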
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
400                  1      < 0.1%
5460                 1      < 0.1%
5477                 1      < 0.1%
5476                 1      < 0.1%
5475                 1      < 0.1%
5474                 1      < 0.1%
5473                 1      < 0.1%
5472                 1      < 0.1%
5471                 1      < 0.1%
5470                 1      < 0.1%
Other values (4714)  4714   99.8%

Value  Count  Frequency (%)
400    1      < 0.1%
401    1      < 0.1%
402    1      < 0.1%
403    1      < 0.1%
404    1      < 0.1%
405    1      < 0.1%
406    1      < 0.1%
407    1      < 0.1%
408    1      < 0.1%
409    1      < 0.1%

Value  Count  Frequency (%)
8031   1      < 0.1%
8030   1      < 0.1%
8029   1      < 0.1%
8028   1      < 0.1%
8027   1      < 0.1%
8026   1      < 0.1%
8025   1      < 0.1%
8024   1      < 0.1%
8023   1      < 0.1%
8022   1      < 0.1%

Author
Text

Distinct: 2409
Distinct (%): 51.0%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB

Length

Max length: 268
Median length: 167
Mean length: 18.862828
Min length: 1

Characters and Unicode

Total characters: 89108
Distinct characters: 100
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
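
As an illustration of these character properties, the sketch below counts the Unicode general category of each character in one Author value using Python's standard `unicodedata` module. The report may rely on a different library for its lookups; the two-letter codes returned here correspond to the category names used in the tables (`Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Zs` is Space Separator). The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Ll', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts("Carolyn Wells")  # an Author value that appears in this report
for code, n in counts.most_common():
    print(code, n)
```

Summing such counts over every value of a column would give a per-category table of the kind shown under "Most occurring categories".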

Unique

Unique: 1947
Unique (%): 41.2%

Sample

1st row: Carolyn Wells
2nd row: Carolyn Wells
3rd row: Carolyn Wells
4th row: CHARLES KINGSLEY
5th row: Charles Kingsley
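
The Distinct and Unique figures measure different things. In the sketch below, "distinct" is taken to mean the number of different values and "unique" the number of values that occur exactly once. These definitions are an assumption about the report's terminology, and the five sample rows stand in for the full column.

```python
from collections import Counter

# The five sample rows shown for the Author variable.
sample_authors = [
    "Carolyn Wells", "Carolyn Wells", "Carolyn Wells",
    "CHARLES KINGSLEY", "Charles Kingsley",
]

value_counts = Counter(sample_authors)
n_distinct = len(value_counts)                                  # different values
n_unique = sum(1 for c in value_counts.values() if c == 1)      # values seen exactly once

print(f"Distinct: {n_distinct}, Unique: {n_unique}")
```

The comparison is case-sensitive, so "CHARLES KINGSLEY" and "Charles Kingsley" count as two different values. Inconsistent casing of this kind would inflate both figures for the Author column.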
ValueCountFrequency (%)
448
 
3.1%
wiki 276
 
1.9%
simple 275
 
1.9%
wikipedia 274
 
1.9%
a 191
 
1.3%
m 168
 
1.2%
and 148
 
1.0%
h 141
 
1.0%
by 141
 
1.0%
e 141
 
1.0%
Other values (4315) 12282
84.8%

Most occurring characters

ValueCountFrequency (%)
9631
 
10.8%
e 6928
 
7.8%
a 6484
 
7.3%
i 5786
 
6.5%
r 4787
 
5.4%
n 4430
 
5.0%
o 3827
 
4.3%
l 3530
 
4.0%
s 3011
 
3.4%
t 2996
 
3.4%
Other values (90) 37698
42.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 58083
65.2%
Uppercase Letter 16926
 
19.0%
Space Separator 9631
 
10.8%
Other Punctuation 3592
 
4.0%
Control 387
 
0.4%
Decimal Number 256
 
0.3%
Dash Punctuation 139
 
0.2%
Close Punctuation 31
 
< 0.1%
Open Punctuation 31
 
< 0.1%
Math Symbol 24
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 6928
11.9%
a 6484
11.2%
i 5786
10.0%
r 4787
 
8.2%
n 4430
 
7.6%
o 3827
 
6.6%
l 3530
 
6.1%
s 3011
 
5.2%
t 2996
 
5.2%
d 2077
 
3.6%
Other values (34) 14227
24.5%
Uppercase Letter
ValueCountFrequency (%)
A 1453
 
8.6%
M 1268
 
7.5%
S 1246
 
7.4%
E 1197
 
7.1%
C 962
 
5.7%
H 938
 
5.5%
R 934
 
5.5%
L 918
 
5.4%
B 823
 
4.9%
T 743
 
4.4%
Other values (19) 6444
38.1%
Decimal Number
ValueCountFrequency (%)
1 102
39.8%
2 38
 
14.8%
9 34
 
13.3%
4 33
 
12.9%
5 11
 
4.3%
3 9
 
3.5%
6 8
 
3.1%
8 7
 
2.7%
7 7
 
2.7%
0 7
 
2.7%
Other Punctuation
ValueCountFrequency (%)
. 2299
64.0%
, 715
 
19.9%
? 299
 
8.3%
& 175
 
4.9%
; 48
 
1.3%
' 34
 
0.9%
" 18
 
0.5%
: 4
 
0.1%
Math Symbol
ValueCountFrequency (%)
> 8
33.3%
+ 8
33.3%
< 8
33.3%
Space Separator
ValueCountFrequency (%)
9631
100.0%
Control
ValueCountFrequency (%)
387
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 139
100.0%
Close Punctuation
ValueCountFrequency (%)
) 31
100.0%
Open Punctuation
ValueCountFrequency (%)
( 31
100.0%
Final Punctuation
ValueCountFrequency (%)
8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 75009
84.2%
Common 14099
 
15.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 6928
 
9.2%
a 6484
 
8.6%
i 5786
 
7.7%
r 4787
 
6.4%
n 4430
 
5.9%
o 3827
 
5.1%
l 3530
 
4.7%
s 3011
 
4.0%
t 2996
 
4.0%
d 2077
 
2.8%
Other values (63) 31153
41.5%
Common
ValueCountFrequency (%)
9631
68.3%
. 2299
 
16.3%
, 715
 
5.1%
387
 
2.7%
? 299
 
2.1%
& 175
 
1.2%
- 139
 
1.0%
1 102
 
0.7%
; 48
 
0.3%
2 38
 
0.3%
Other values (17) 266
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 88946
99.8%
None 154
 
0.2%
Punctuation 8
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9631
 
10.8%
e 6928
 
7.8%
a 6484
 
7.3%
i 5786
 
6.5%
r 4787
 
5.4%
n 4430
 
5.0%
o 3827
 
4.3%
l 3530
 
4.0%
s 3011
 
3.4%
t 2996
 
3.4%
Other values (68) 37536
42.2%
None
ValueCountFrequency (%)
é 55
35.7%
í 15
 
9.7%
á 13
 
8.4%
ö 12
 
7.8%
ä 8
 
5.2%
ñ 8
 
5.2%
ó 7
 
4.5%
ü 6
 
3.9%
è 5
 
3.2%
ú 4
 
2.6%
Other values (11) 21
 
13.6%
Punctuation
ValueCountFrequency (%)
8
100.0%

Title
Text

Distinct: 4658
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB

Length

Max length: 189
Median length: 107
Mean length: 27.504869
Min length: 1

Characters and Unicode

Total characters: 129933
Distinct characters: 101
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4612
Unique (%): 97.6%

Sample

1st row: Patty's Suitors
2nd row: Two Little Women on a Holiday
3rd row: Patty Blossom
4th row: THE WATER-BABIES A Fairy Tale for a Land-Baby
5th row: HOW THE ARGONAUTS WERE DRIVEN INTO THE UNKNOWN SEA
ValueCountFrequency (%)
the 2483
 
11.4%
of 1006
 
4.6%
and 653
 
3.0%
a 549
 
2.5%
in 332
 
1.5%
to 243
 
1.1%
how 195
 
0.9%
on 159
 
0.7%
story 146
 
0.7%
for 129
 
0.6%
Other values (6670) 15920
73.0%

Most occurring characters

ValueCountFrequency (%)
17392
 
13.4%
e 9274
 
7.1%
o 6148
 
4.7%
a 5781
 
4.4%
n 5463
 
4.2%
t 5350
 
4.1%
i 5315
 
4.1%
r 5300
 
4.1%
T 4179
 
3.2%
s 4155
 
3.2%
Other values (91) 61576
47.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 68795
52.9%
Uppercase Letter 40432
31.1%
Space Separator 17392
 
13.4%
Other Punctuation 1836
 
1.4%
Dash Punctuation 359
 
0.3%
Control 353
 
0.3%
Connector Punctuation 309
 
0.2%
Decimal Number 220
 
0.2%
Final Punctuation 114
 
0.1%
Close Punctuation 42
 
< 0.1%
Other values (3) 81
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 9274
13.5%
o 6148
 
8.9%
a 5781
 
8.4%
n 5463
 
7.9%
t 5350
 
7.8%
i 5315
 
7.7%
r 5300
 
7.7%
s 4155
 
6.0%
h 3552
 
5.2%
l 3040
 
4.4%
Other values (21) 15417
22.4%
Uppercase Letter
ValueCountFrequency (%)
T 4179
 
10.3%
E 3695
 
9.1%
A 3233
 
8.0%
S 2887
 
7.1%
I 2356
 
5.8%
O 2349
 
5.8%
R 2342
 
5.8%
H 2240
 
5.5%
N 2163
 
5.3%
C 1729
 
4.3%
Other values (20) 13259
32.8%
Other Punctuation
ValueCountFrequency (%)
' 467
25.4%
. 364
19.8%
, 295
16.1%
: 275
15.0%
? 236
12.9%
" 90
 
4.9%
! 63
 
3.4%
; 25
 
1.4%
& 8
 
0.4%
% 5
 
0.3%
Other values (2) 8
 
0.4%
Decimal Number
ValueCountFrequency (%)
1 65
29.5%
2 32
14.5%
0 24
 
10.9%
3 20
 
9.1%
9 20
 
9.1%
4 17
 
7.7%
8 15
 
6.8%
7 13
 
5.9%
6 7
 
3.2%
5 7
 
3.2%
Math Symbol
ValueCountFrequency (%)
+ 3
37.5%
> 2
25.0%
< 2
25.0%
× 1
 
12.5%
Dash Punctuation
ValueCountFrequency (%)
- 313
87.2%
40
 
11.1%
6
 
1.7%
Final Punctuation
ValueCountFrequency (%)
84
73.7%
30
 
26.3%
Close Punctuation
ValueCountFrequency (%)
) 39
92.9%
] 3
 
7.1%
Open Punctuation
ValueCountFrequency (%)
( 39
92.9%
[ 3
 
7.1%
Initial Punctuation
ValueCountFrequency (%)
30
96.8%
1
 
3.2%
Space Separator
ValueCountFrequency (%)
17392
100.0%
Control
ValueCountFrequency (%)
353
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 309
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 109227
84.1%
Common 20706
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 9274
 
8.5%
o 6148
 
5.6%
a 5781
 
5.3%
n 5463
 
5.0%
t 5350
 
4.9%
i 5315
 
4.9%
r 5300
 
4.9%
T 4179
 
3.8%
s 4155
 
3.8%
E 3695
 
3.4%
Other values (51) 54567
50.0%
Common
ValueCountFrequency (%)
17392
84.0%
' 467
 
2.3%
. 364
 
1.8%
353
 
1.7%
- 313
 
1.5%
_ 309
 
1.5%
, 295
 
1.4%
: 275
 
1.3%
? 236
 
1.1%
" 90
 
0.4%
Other values (30) 612
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 129717
99.8%
Punctuation 194
 
0.1%
None 22
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
17392
 
13.4%
e 9274
 
7.1%
o 6148
 
4.7%
a 5781
 
4.5%
n 5463
 
4.2%
t 5350
 
4.1%
i 5315
 
4.1%
r 5300
 
4.1%
T 4179
 
3.2%
s 4155
 
3.2%
Other values (74) 61360
47.3%
Punctuation
ValueCountFrequency (%)
84
43.3%
40
20.6%
30
 
15.5%
30
 
15.5%
6
 
3.1%
3
 
1.5%
1
 
0.5%
None
ValueCountFrequency (%)
Æ 5
22.7%
æ 5
22.7%
é 4
18.2%
Ö 2
 
9.1%
Î 1
 
4.5%
ö 1
 
4.5%
× 1
 
4.5%
É 1
 
4.5%
ü 1
 
4.5%
à 1
 
4.5%

Source
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 19
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB
gutenberg: 2916
kids.frontiersin: 458
commonlit: 296
simple.wikipedia: 275
wikipedia: 274
Other values (14): 505

Length

Max length: 18
Median length: 9
Mean length: 10.768628
Min length: 4

Characters and Unicode

Total characters: 50871
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: gutenberg
2nd row: gutenberg
3rd row: gutenberg
4th row: gutenberg
5th row: gutenberg

Common Values

ValueCountFrequency (%)
gutenberg 2916
61.7%
kids.frontiersin 458
 
9.7%
commonlit 296
 
6.3%
simple.wikipedia 275
 
5.8%
wikipedia 274
 
5.8%
africanstorybook 250
 
5.3%
online-literature 95
 
2.0%
digitallibrary 61
 
1.3%
freekidsbooks 50
 
1.1%
static.ehe.osu.edu 16
 
0.3%
Other values (9) 33
 
0.7%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
e 7573
14.9%
g 5906
11.6%
r 4703
9.2%
n 4583
9.0%
i 4318
8.5%
t 4204
8.3%
b 3294
 
6.5%
u 3057
 
6.0%
o 2067
 
4.1%
s 1590
 
3.1%
Other values (18) 9576
18.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 49963
98.2%
Other Punctuation 805
 
1.6%
Dash Punctuation 95
 
0.2%
Decimal Number 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 7573
15.2%
g 5906
11.8%
r 4703
9.4%
n 4583
9.2%
i 4318
8.6%
t 4204
8.4%
b 3294
 
6.6%
u 3057
 
6.1%
o 2067
 
4.1%
s 1590
 
3.2%
Other values (14) 8668
17.3%
Decimal Number
ValueCountFrequency (%)
1 4
50.0%
2 4
50.0%
Other Punctuation
ValueCountFrequency (%)
. 805
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 95
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 49963
98.2%
Common 908
 
1.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 7573
15.2%
g 5906
11.8%
r 4703
9.4%
n 4583
9.2%
i 4318
8.6%
t 4204
8.4%
b 3294
 
6.6%
u 3057
 
6.1%
o 2067
 
4.1%
s 1590
 
3.2%
Other values (14) 8668
17.3%
Common
ValueCountFrequency (%)
. 805
88.7%
- 95
 
10.5%
1 4
 
0.4%
2 4
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 50871
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 7573
14.9%
g 5906
11.6%
r 4703
9.2%
n 4583
9.0%
i 4318
8.5%
t 4204
8.3%
b 3294
 
6.5%
u 3057
 
6.0%
o 2067
 
4.1%
s 1590
 
3.1%
Other values (18) 9576
18.8%

Pub Year
Real number (ℝ)

Distinct: 168
Distinct (%): 3.6%
Missing: 9
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1937.887
Minimum: 1728
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 37.0 KiB

Quantile statistics

Minimum: 1728
5-th percentile: 1867
Q1: 1884
Median: 1915
Q3: 2016
95-th percentile: 2020
Maximum: 2020
Range: 292
Interquartile range (IQR): 132

Descriptive statistics

Standard deviation: 60.506795
Coefficient of variation (CV): 0.031223078
Kurtosis: -1.4246343
Mean: 1937.887
Median Absolute Deviation (MAD): 35
Skewness: 0.35347988
Sum: 9137137
Variance: 3661.0723
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2020 592
 
12.5%
2019 241
 
5.1%
2017 167
 
3.5%
1915 160
 
3.4%
1881 153
 
3.2%
2018 151
 
3.2%
1883 148
 
3.1%
1882 138
 
2.9%
1922 128
 
2.7%
1914 120
 
2.5%
Other values (158) 2717
57.5%
ValueCountFrequency (%)
1728 1
 
< 0.1%
1761 1
 
< 0.1%
1781 1
 
< 0.1%
1789 1
 
< 0.1%
1791 3
0.1%
1792 1
 
< 0.1%
1811 1
 
< 0.1%
1812 1
 
< 0.1%
1813 2
< 0.1%
1814 1
 
< 0.1%
ValueCountFrequency (%)
2020 592
12.5%
2019 241
5.1%
2018 151
 
3.2%
2017 167
 
3.5%
2016 111
 
2.3%
2015 95
 
2.0%
2014 112
 
2.4%
2013 39
 
0.8%
2012 10
 
0.2%
2011 7
 
0.1%

Category
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB
Lit: 2420
Info: 2304

Length

Max length: 4
Median length: 3
Mean length: 3.4877223
Min length: 3

Characters and Unicode

Total characters: 16476
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lit
2nd row: Lit
3rd row: Lit
4th row: Lit
5th row: Lit

Common Values

ValueCountFrequency (%)
Lit 2420
51.2%
Info 2304
48.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
lit 2420
51.2%
info 2304
48.8%

Most occurring characters

ValueCountFrequency (%)
L 2420
14.7%
i 2420
14.7%
t 2420
14.7%
I 2304
14.0%
n 2304
14.0%
f 2304
14.0%
o 2304
14.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11752
71.3%
Uppercase Letter 4724
28.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 2420
20.6%
t 2420
20.6%
n 2304
19.6%
f 2304
19.6%
o 2304
19.6%
Uppercase Letter
ValueCountFrequency (%)
L 2420
51.2%
I 2304
48.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 16476
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 2420
14.7%
i 2420
14.7%
t 2420
14.7%
I 2304
14.0%
n 2304
14.0%
f 2304
14.0%
o 2304
14.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16476
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 2420
14.7%
i 2420
14.7%
t 2420
14.7%
I 2304
14.0%
n 2304
14.0%
f 2304
14.0%
o 2304
14.0%

Location
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB
mid: 3470
start: 1024
whole: 122
end: 108

Length

Max length: 5
Median length: 3
Mean length: 3.485182
Min length: 3

Characters and Unicode

Total characters: 16464
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mid
2nd row: mid
3rd row: mid
4th row: mid
5th row: mid

Common Values

ValueCountFrequency (%)
mid 3470
73.5%
start 1024
 
21.7%
whole 122
 
2.6%
end 108
 
2.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
mid 3470
73.5%
start 1024
 
21.7%
whole 122
 
2.6%
end 108
 
2.3%

Most occurring characters

ValueCountFrequency (%)
d 3578
21.7%
m 3470
21.1%
i 3470
21.1%
t 2048
12.4%
s 1024
 
6.2%
a 1024
 
6.2%
r 1024
 
6.2%
e 230
 
1.4%
w 122
 
0.7%
h 122
 
0.7%
Other values (3) 352
 
2.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16464
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d 3578
21.7%
m 3470
21.1%
i 3470
21.1%
t 2048
12.4%
s 1024
 
6.2%
a 1024
 
6.2%
r 1024
 
6.2%
e 230
 
1.4%
w 122
 
0.7%
h 122
 
0.7%
Other values (3) 352
 
2.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 16464
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
d 3578
21.7%
m 3470
21.1%
i 3470
21.1%
t 2048
12.4%
s 1024
 
6.2%
a 1024
 
6.2%
r 1024
 
6.2%
e 230
 
1.4%
w 122
 
0.7%
h 122
 
0.7%
Other values (3) 352
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16464
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
d 3578
21.7%
m 3470
21.1%
i 3470
21.1%
t 2048
12.4%
s 1024
 
6.2%
a 1024
 
6.2%
r 1024
 
6.2%
e 230
 
1.4%
w 122
 
0.7%
h 122
 
0.7%
Other values (3) 352
 
2.1%

MPAA Max
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB
G: 3706
PG: 928
PG-13: 87
R: 3

Length

Max length: 5
Median length: 1
Mean length: 1.2701101
Min length: 1

Characters and Unicode

Total characters: 6000
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: G
2nd row: PG
3rd row: PG
4th row: PG-13
5th row: PG

Common Values

ValueCountFrequency (%)
G 3706
78.5%
PG 928
 
19.6%
PG-13 87
 
1.8%
R 3
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
g 3706
78.5%
pg 928
 
19.6%
pg-13 87
 
1.8%
r 3
 
0.1%

Most occurring characters

ValueCountFrequency (%)
G 4721
78.7%
P 1015
 
16.9%
- 87
 
1.5%
1 87
 
1.5%
3 87
 
1.5%
R 3
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5739
95.7%
Decimal Number 174
 
2.9%
Dash Punctuation 87
 
1.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
G 4721
82.3%
P 1015
 
17.7%
R 3
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 87
50.0%
3 87
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 87
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5739
95.7%
Common 261
 
4.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
G 4721
82.3%
P 1015
 
17.7%
R 3
 
0.1%
Common
ValueCountFrequency (%)
- 87
33.3%
1 87
33.3%
3 87
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G 4721
78.7%
P 1015
 
16.9%
- 87
 
1.5%
1 87
 
1.5%
3 87
 
1.5%
R 3
 
< 0.1%

Excerpt
Text

UNIQUE 

Distinct: 4724
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 37.0 KiB

Length

Max length: 1341
Median length: 1111
Mean length: 972.80356
Min length: 667

Characters and Unicode

Total characters: 4595524
Distinct characters: 133
Distinct categories: 18
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4724
Unique (%): 100.0%

Sample

1st rowWhen the young people returned to the ballroom, it presented a decidedly changed appearance. Instead of an interior scene, it was a winter landscape. The floor was covered with snow-white canvas, not laid on smoothly, but rumpled over bumps and hillocks, like a real snow field. The numerous palms and evergreens that had decorated the room, were powdered with flour and strewn with tufts of cotton, like snow. Also diamond dust had been lightly sprinkled on them, and glittering crystal icicles hung from the branches. At each end of the room, on the wall, hung a beautiful bear-skin rug. These rugs were for prizes, one for the girls and one for the boys. And this was the game. The girls were gathered at one end of the room and the boys at the other, and one end was called the North Pole, and the other the South Pole. Each player was given a small flag which they were to plant on reaching the Pole. This would have been an easy matter, but each traveller was obliged to wear snowshoes.
2nd rowAll through dinner time, Mrs. Fayre was somewhat silent, her eyes resting on Dolly with a wistful, uncertain expression. She wanted to give the child the pleasure she craved, but she had hard work to bring herself to the point of overcoming her own objections. At last, however, when the meal was nearly over, she smiled at her little daughter, and said, "All right, Dolly, you may go." "Oh, mother!" Dolly cried, overwhelmed with sudden delight. "Really? Oh, I am so glad! Are you sure you're willing?" "I've persuaded myself to be willing, against my will," returned Mrs. Fayre, whimsically. "I confess I just hate to have you go, but I can't bear to deprive you of the pleasure trip. And, as you say, it would also keep Dotty at home, and so, altogether, I think I shall have to give in." "Oh, you angel mother! You blessed lady! How good you are!" And Dolly flew around the table and gave her mother a hug that nearly suffocated her.
3rd rowAs Roger had predicted, the snow departed as quickly as it came, and two days after their sleigh ride there was scarcely a vestige of white on the ground. Tennis was again possible and a great game was in progress on the court at Pine Laurel. Patty and Roger were playing against Elise and Sam Blaney, and the pairs were well matched. But the long-contested victory finally went against Patty, and she laughingly accepted defeat. "Only because Patty's not quite back on her game yet," Roger defended; "this child has been on the sick list, you know, Sam, and she isn't up to her own mark." "Well, I like that!" cried Patty; "suppose you bear half the blame, Roger. You see, Mr. Blaney, he is so absorbed in his own Love Game, he can't play with his old-time skill." "All right, Patsy, let it go at that. And it's so, too. I suddenly remembered something Mona told me to tell you, and it affected my service."
4th rowMr. Grimes was to come up next morning to Sir John Harthover's, at the Place, for his old chimney-sweep was gone to prison, and the chimneys wanted sweeping. And so he rode away, not giving Tom time to ask what the sweep had gone to prison for, which was a matter of interest to Tom, as he had been in prison once or twice himself. Moreover, the groom looked so very neat and clean, with his drab gaiters, drab breeches, drab jacket, snow-white tie with a smart pin in it, and clean round ruddy face, that Tom was offended and disgusted at his appearance, and considered him a stuck-up fellow, who gave himself airs because he wore smart clothes, and other people paid for them; and went behind the wall to fetch the half-brick after all; but did not, remembering that he had come in the way of business, and was, as it were, under a flag of truce.
5th rowAnd outside before the palace a great garden was walled round, filled full of stately fruit-trees, gray olives and sweet figs, and pomegranates, pears, and apples, which bore the whole year round. For the rich south-west wind fed them, till pear grew ripe on pear, fig on fig, and grape on grape, all the winter and the spring. And at the farther end gay flower-beds bloomed through all seasons of the year; and two fair fountains rose, and ran, one through the garden grounds, and one beneath the palace gate, to water all the town. Such noble gifts the heavens had given to Alcinous the wise. So they went in, and saw him sitting, like Poseidon, on his throne, with his golden sceptre by him, in garments stiff with gold, and in his hand a sculptured goblet, as he pledged the merchant kings; and beside him stood Arete, his wise and lovely queen, and leaned against a pillar as she spun her golden threads.
ValueCountFrequency (%)
the 56066
 
6.8%
and 28015
 
3.4%
of 25157
 
3.1%
to 21457
 
2.6%
a 20137
 
2.5%
in 15331
 
1.9%
was 9619
 
1.2%
that 8857
 
1.1%
is 8825
 
1.1%
it 8437
 
1.0%
Other values (40071) 616942
75.3%

Most occurring characters

ValueCountFrequency (%)
807675
17.6%
e 457138
 
9.9%
t 324208
 
7.1%
a 293591
 
6.4%
o 271463
 
5.9%
n 247803
 
5.4%
i 234603
 
5.1%
s 229706
 
5.0%
r 217184
 
4.7%
h 212700
 
4.6%
Other values (123) 1299453
28.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3549288
77.2%
Space Separator 807675
 
17.6%
Other Punctuation 125119
 
2.7%
Uppercase Letter 85116
 
1.9%
Decimal Number 9873
 
0.2%
Dash Punctuation 7356
 
0.2%
Control 7290
 
0.2%
Open Punctuation 1681
 
< 0.1%
Close Punctuation 1681
 
< 0.1%
Initial Punctuation 135
 
< 0.1%
Other values (8) 310
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 457138
12.9%
t 324208
 
9.1%
a 293591
 
8.3%
o 271463
 
7.6%
n 247803
 
7.0%
i 234603
 
6.6%
s 229706
 
6.5%
r 217184
 
6.1%
h 212700
 
6.0%
l 149711
 
4.2%
Other values (38) 911181
25.7%
Uppercase Letter
ValueCountFrequency (%)
T 12969
15.2%
I 10272
12.1%
A 7247
 
8.5%
S 6221
 
7.3%
H 4951
 
5.8%
W 4381
 
5.1%
B 4341
 
5.1%
M 4250
 
5.0%
C 3668
 
4.3%
E 2816
 
3.3%
Other values (19) 24000
28.2%
Other Punctuation
ValueCountFrequency (%)
, 55380
44.3%
. 43132
34.5%
" 11634
 
9.3%
' 5571
 
4.5%
; 4121
 
3.3%
! 2214
 
1.8%
? 1699
 
1.4%
: 1083
 
0.9%
% 106
 
0.1%
/ 106
 
0.1%
Other values (6) 73
 
0.1%
Decimal Number
ValueCountFrequency (%)
0 2370
24.0%
1 2047
20.7%
2 1025
10.4%
9 746
 
7.6%
5 721
 
7.3%
8 700
 
7.1%
3 667
 
6.8%
4 560
 
5.7%
6 527
 
5.3%
7 510
 
5.2%
Math Symbol
ValueCountFrequency (%)
+ 17
37.8%
= 8
17.8%
~ 7
15.6%
× 7
15.6%
< 3
 
6.7%
± 2
 
4.4%
÷ 1
 
2.2%
Other Number
ValueCountFrequency (%)
½ 34
73.9%
¼ 6
 
13.0%
¹ 4
 
8.7%
¾ 2
 
4.3%
Dash Punctuation
ValueCountFrequency (%)
- 5689
77.3%
1495
 
20.3%
172
 
2.3%
Open Punctuation
ValueCountFrequency (%)
( 1671
99.4%
[ 10
 
0.6%
Close Punctuation
ValueCountFrequency (%)
) 1671
99.4%
] 10
 
0.6%
Initial Punctuation
ValueCountFrequency (%)
129
95.6%
6
 
4.4%
Currency Symbol
ValueCountFrequency (%)
$ 72
86.7%
£ 11
 
13.3%
Final Punctuation
ValueCountFrequency (%)
43
87.8%
6
 
12.2%
Space Separator
ValueCountFrequency (%)
807675
100.0%
Control
ValueCountFrequency (%)
7290
100.0%
Other Symbol
ValueCountFrequency (%)
° 76
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 9
100.0%
Modifier Symbol
ValueCountFrequency (%)
´ 1
100.0%
Format
ValueCountFrequency (%)
­ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3634401
79.1%
Common 961123
 
20.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 457138
12.6%
t 324208
 
8.9%
a 293591
 
8.1%
o 271463
 
7.5%
n 247803
 
6.8%
i 234603
 
6.5%
s 229706
 
6.3%
r 217184
 
6.0%
h 212700
 
5.9%
l 149711
 
4.1%
Other values (66) 996294
27.4%
Common
ValueCountFrequency (%)
807675
84.0%
, 55380
 
5.8%
. 43132
 
4.5%
" 11634
 
1.2%
7290
 
0.8%
- 5689
 
0.6%
' 5571
 
0.6%
; 4121
 
0.4%
0 2370
 
0.2%
! 2214
 
0.2%
Other values (47) 16047
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4593201
99.9%
Punctuation 1897
 
< 0.1%
None 426
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
807675
17.6%
e 457138
 
10.0%
t 324208
 
7.1%
a 293591
 
6.4%
o 271463
 
5.9%
n 247803
 
5.4%
i 234603
 
5.1%
s 229706
 
5.0%
r 217184
 
4.7%
h 212700
 
4.6%
Other values (78) 1297130
28.2%
Punctuation
ValueCountFrequency (%)
1495
78.8%
172
 
9.1%
129
 
6.8%
43
 
2.3%
31
 
1.6%
15
 
0.8%
6
 
0.3%
6
 
0.3%
None
ValueCountFrequency (%)
é 78
18.3%
° 76
17.8%
æ 37
 
8.7%
½ 34
 
8.0%
ö 22
 
5.2%
á 20
 
4.7%
Æ 14
 
3.3%
è 14
 
3.3%
œ 11
 
2.6%
£ 11
 
2.6%
Other values (27) 109
25.6%

Google WC
Real number (ℝ)

HIGH CORRELATION 

Distinct: 76
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 171.9602
Minimum: 125
Maximum: 205
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 37.0 KiB

Quantile statistics

Minimum: 125
5-th percentile: 143
Q1: 158
Median: 174
Q3: 186
95-th percentile: 197
Maximum: 205
Range: 80
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 16.988921
Coefficient of variation (CV): 0.098795656
Kurtosis: -0.99875435
Mean: 171.9602
Median Absolute Deviation (MAD): 14
Skewness: -0.25087257
Sum: 812340
Variance: 288.62344
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
179 107
 
2.3%
197 105
 
2.2%
189 103
 
2.2%
161 102
 
2.2%
177 101
 
2.1%
185 99
 
2.1%
191 99
 
2.1%
175 98
 
2.1%
192 97
 
2.1%
182 97
 
2.1%
Other values (66) 3716
78.7%
ValueCountFrequency (%)
125 1
 
< 0.1%
126 1
 
< 0.1%
129 1
 
< 0.1%
132 1
 
< 0.1%
133 1
 
< 0.1%
134 3
 
0.1%
135 5
 
0.1%
136 5
 
0.1%
137 12
0.3%
138 16
0.3%
ValueCountFrequency (%)
205 2
 
< 0.1%
203 3
 
0.1%
202 7
 
0.1%
201 11
 
0.2%
200 25
 
0.5%
199 52
1.1%
198 61
1.3%
197 105
2.2%
196 70
1.5%
195 75
1.6%

Joon WC v1
Real number (ℝ)

HIGH CORRELATION 

Distinct: 86
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 176.92549
Minimum: 135
Maximum: 220
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 37.0 KiB

Quantile statistics

Minimum: 135
5-th percentile: 146
Q1: 162
Median: 178
Q3: 191
95-th percentile: 204
Maximum: 220
Range: 85
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 18.173592
Coefficient of variation (CV): 0.1027189
Kurtosis: -0.86274809
Mean: 176.92549
Median Absolute Deviation (MAD): 14
Skewness: -0.13168323
Sum: 835796
Variance: 330.27943
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
193 100
 
2.1%
177 100
 
2.1%
190 97
 
2.1%
184 97
 
2.1%
198 97
 
2.1%
188 96
 
2.0%
189 96
 
2.0%
196 95
 
2.0%
183 93
 
2.0%
185 93
 
2.0%
Other values (76) 3760
79.6%
ValueCountFrequency (%)
135 3
 
0.1%
136 3
 
0.1%
137 1
 
< 0.1%
138 3
 
0.1%
139 1
 
< 0.1%
140 24
0.5%
141 24
0.5%
142 32
0.7%
143 34
0.7%
144 34
0.7%
ValueCountFrequency (%)
220 5
 
0.1%
219 5
 
0.1%
218 5
 
0.1%
217 6
0.1%
216 8
0.2%
215 8
0.2%
214 6
0.1%
213 14
0.3%
212 12
0.3%
211 13
0.3%

British WC
Real number (ℝ)

ZEROS 

Distinct8
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.12933954
Minimum0
Maximum9
Zeros4280
Zeros (%)90.6%
Negative0
Negative (%)0.0%
Memory size37.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum9
Range9
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.47104911
Coefficient of variation (CV)3.6419574
Kurtosis51.392443
Mean0.12933954
Median Absolute Deviation (MAD)0
Skewness5.6099155
Sum611
Variance0.22188726
MonotonicityNot monotonic
Histogram with fixed size bins (bins=8)
ValueCountFrequency (%)
0 4280
90.6%
1 330
 
7.0%
2 78
 
1.7%
3 29
 
0.6%
5 3
 
0.1%
4 2
 
< 0.1%
9 1
 
< 0.1%
6 1
 
< 0.1%
ValueCountFrequency (%)
0 4280
90.6%
1 330
 
7.0%
2 78
 
1.7%
3 29
 
0.6%
4 2
 
< 0.1%
5 3
 
0.1%
6 1
 
< 0.1%
9 1
 
< 0.1%
ValueCountFrequency (%)
9 1
 
< 0.1%
6 1
 
< 0.1%
5 3
 
0.1%
4 2
 
< 0.1%
3 29
 
0.6%
2 78
 
1.7%
1 330
 
7.0%
0 4280
90.6%
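
Because British WC takes only eight distinct values, its summary statistics can be rebuilt from the frequency table alone. The sketch below does this for the sum, mean, share of zeros and variance. The n - 1 divisor for the variance is an assumption, chosen because it reproduces the reported figure.

```python
# Value -> count pairs copied from the British WC frequency table above.
freq = {0: 4280, 1: 330, 2: 78, 3: 29, 4: 2, 5: 3, 6: 1, 9: 1}

n = sum(freq.values())                         # number of observations
total = sum(v * c for v, c in freq.items())    # sum of all values
mean = total / n
zeros_pct = 100 * freq[0] / n                  # share of zero-valued rows

# Sample variance (divisor n - 1); the divisor is an assumption that
# matches the value printed in the report.
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)

print(n, total, round(mean, 8), round(zeros_pct, 1), round(variance, 8))
```

The 4280 zero-valued rows equal the 4280 missing values reported for British Words, which suggests (but does not prove) that the word list is empty exactly when the count is zero.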

British Words
Text

MISSING 

Distinct250
Distinct (%)56.3%
Missing4280
Missing (%)90.6%
Memory size37.0 KiB

Length

Max length65
Median length35
Mean length11.31982
Min length3

Characters and Unicode

Total characters5026
Distinct characters27
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
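
As a rough illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The sample string is hypothetical, written in the comma-separated style the British Words column appears to use. Script and block are not covered, since the standard library exposes only the category.

```python
import unicodedata
from collections import Counter

# Hypothetical cell value in the style of the British Words column.
value = "grey, colour"

# unicodedata.category returns a two-letter code per character:
# "Ll" = Lowercase Letter, "Po" = Other Punctuation, "Zs" = Space Separator.
categories = Counter(unicodedata.category(ch) for ch in value)

for code, count in categories.most_common():
    print(code, count)
```

A value with one comma and one space yields the same three categories that the report lists for this column.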

Unique

Unique173
Unique (%)39.0%

Sample

1st rowtraveller
2nd rowsceptre
3rd rowgrey
4th rowaeroplane
5th rowaxe
ValueCountFrequency (%)
grey 34
 
5.6%
travelled 28
 
4.6%
colour 20
 
3.3%
metres 18
 
2.9%
centre 15
 
2.5%
axe 11
 
1.8%
travelling 11
 
1.8%
theatre 11
 
1.8%
mould 10
 
1.6%
kilometres 10
 
1.6%
Other values (186) 443
72.5%

Most occurring characters

ValueCountFrequency (%)
e 644
12.8%
r 515
 
10.2%
l 410
 
8.2%
o 367
 
7.3%
a 324
 
6.4%
u 292
 
5.8%
s 273
 
5.4%
t 237
 
4.7%
i 234
 
4.7%
n 189
 
3.8%
Other values (17) 1541
30.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4692
93.4%
Other Punctuation 167
 
3.3%
Space Separator 167
 
3.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 644
13.7%
r 515
11.0%
l 410
 
8.7%
o 367
 
7.8%
a 324
 
6.9%
u 292
 
6.2%
s 273
 
5.8%
t 237
 
5.1%
i 234
 
5.0%
n 189
 
4.0%
Other values (15) 1207
25.7%
Other Punctuation
ValueCountFrequency (%)
, 167
100.0%
Space Separator
ValueCountFrequency (%)
(space) 167
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4692
93.4%
Common 334
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 644
13.7%
r 515
11.0%
l 410
 
8.7%
o 367
 
7.8%
a 324
 
6.9%
u 292
 
6.2%
s 273
 
5.8%
t 237
 
5.1%
i 234
 
5.0%
n 189
 
4.0%
Other values (15) 1207
25.7%
Common
ValueCountFrequency (%)
, 167
50.0%
(space) 167
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5026
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 644
12.8%
r 515
 
10.2%
l 410
 
8.2%
o 367
 
7.3%
a 324
 
6.4%
u 292
 
5.8%
s 273
 
5.4%
t 237
 
4.7%
i 234
 
4.7%
n 189
 
3.8%
Other values (17) 1541
30.7%

Sentence Count v1
Real number (ℝ)

HIGH CORRELATION 

Distinct38
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9.5707028
Minimum2
Maximum41
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size37.0 KiB

Quantile statistics

Minimum2
5-th percentile5
Q17
median8
Q311
95-th percentile19
Maximum41
Range39
Interquartile range (IQR)4

Descriptive statistics

Standard deviation4.6401616
Coefficient of variation (CV)0.48482977
Kurtosis5.2743914
Mean9.5707028
Median Absolute Deviation (MAD)2
Skewness1.8963455
Sum45212
Variance21.5311
MonotonicityNot monotonic
Histogram with fixed size bins (bins=38)
ValueCountFrequency (%)
8 676
14.3%
7 651
13.8%
6 563
11.9%
9 501
10.6%
10 405
8.6%
5 338
7.2%
11 301
 
6.4%
12 232
 
4.9%
13 155
 
3.3%
4 135
 
2.9%
Other values (28) 767
16.2%
ValueCountFrequency (%)
2 19
 
0.4%
3 56
 
1.2%
4 135
 
2.9%
5 338
7.2%
6 563
11.9%
7 651
13.8%
8 676
14.3%
9 501
10.6%
10 405
8.6%
11 301
6.4%
ValueCountFrequency (%)
41 1
 
< 0.1%
39 1
 
< 0.1%
38 2
 
< 0.1%
37 1
 
< 0.1%
36 1
 
< 0.1%
35 3
 
0.1%
33 2
 
< 0.1%
32 2
 
< 0.1%
31 8
0.2%
30 7
0.1%

Sentence Count v2
Real number (ℝ)

HIGH CORRELATION 

Distinct39
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9.7523285
Minimum2
Maximum41
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size37.0 KiB

Quantile statistics

Minimum2
5-th percentile5
Q17
median9
Q311
95-th percentile19
Maximum41
Range39
Interquartile range (IQR)4

Descriptive statistics

Standard deviation4.6813387
Coefficient of variation (CV)0.48002266
Kurtosis5.2278219
Mean9.7523285
Median Absolute Deviation (MAD)2
Skewness1.8978978
Sum46070
Variance21.914932
MonotonicityNot monotonic
Histogram with fixed size bins (bins=39)
ValueCountFrequency (%)
8 663
14.0%
7 646
13.7%
9 535
11.3%
6 517
10.9%
10 412
8.7%
11 325
6.9%
5 302
 
6.4%
12 244
 
5.2%
13 168
 
3.6%
14 142
 
3.0%
Other values (29) 770
16.3%
ValueCountFrequency (%)
2 12
 
0.3%
3 45
 
1.0%
4 135
 
2.9%
5 302
6.4%
6 517
10.9%
7 646
13.7%
8 663
14.0%
9 535
11.3%
10 412
8.7%
11 325
6.9%
ValueCountFrequency (%)
41 1
 
< 0.1%
40 1
 
< 0.1%
38 1
 
< 0.1%
37 1
 
< 0.1%
36 1
 
< 0.1%
35 2
 
< 0.1%
34 4
0.1%
33 4
0.1%
32 3
0.1%
31 5
0.1%

Paragraphs
Real number (ℝ)

Distinct18
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.542337
Minimum1
Maximum20
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size37.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median2
Q33
95-th percentile6
Maximum20
Range19
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.8662981
Coefficient of variation (CV)0.7340876
Kurtosis8.8809902
Mean2.542337
Median Absolute Deviation (MAD)1
Skewness2.2850584
Sum12010
Variance3.4830685
MonotonicityNot monotonic
Histogram with fixed size bins (bins=18)
ValueCountFrequency (%)
1 1644
34.8%
2 1263
26.7%
3 812
17.2%
4 435
 
9.2%
5 253
 
5.4%
6 128
 
2.7%
7 81
 
1.7%
8 44
 
0.9%
10 19
 
0.4%
9 19
 
0.4%
Other values (8) 26
 
0.6%
ValueCountFrequency (%)
1 1644
34.8%
2 1263
26.7%
3 812
17.2%
4 435
 
9.2%
5 253
 
5.4%
6 128
 
2.7%
7 81
 
1.7%
8 44
 
0.9%
9 19
 
0.4%
10 19
 
0.4%
ValueCountFrequency (%)
20 1
 
< 0.1%
17 1
 
< 0.1%
16 2
 
< 0.1%
15 4
 
0.1%
14 5
 
0.1%
13 1
 
< 0.1%
12 7
 
0.1%
11 5
 
0.1%
10 19
0.4%
9 19
0.4%

BT Easiness
Real number (ℝ)

UNIQUE 

Distinct4724
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.95763863
Minimum-3.6762678
Maximum1.7113898
Zeros1
Zeros (%)< 0.1%
Negative3831
Negative (%)81.1%
Memory size37.0 KiB

Quantile statistics

Minimum-3.6762678
5-th percentile-2.7037429
Q1-1.6965546
median-0.90909418
Q3-0.20342801
95-th percentile0.68065368
Maximum1.7113898
Range5.3876576
Interquartile range (IQR)1.4931266

Descriptive statistics

Standard deviation1.0336564
Coefficient of variation (CV)-1.0793804
Kurtosis-0.48537058
Mean-0.95763863
Median Absolute Deviation (MAD)0.7480769
Skewness-0.13451472
Sum-4523.8849
Variance1.0684455
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-0.340259125 1
 
< 0.1%
-2.273949732 1
 
< 0.1%
-3.431114154 1
 
< 0.1%
-0.755286774 1
 
< 0.1%
-0.803262973 1
 
< 0.1%
-0.519230627 1
 
< 0.1%
-2.141945369 1
 
< 0.1%
-0.081595481 1
 
< 0.1%
-0.357133172 1
 
< 0.1%
-0.437143526 1
 
< 0.1%
Other values (4714) 4714
99.8%
ValueCountFrequency (%)
-3.676267773 1
< 0.1%
-3.66836041 1
< 0.1%
-3.64289216 1
< 0.1%
-3.639935554 1
< 0.1%
-3.636833783 1
< 0.1%
-3.596750775 1
< 0.1%
-3.591318724 1
< 0.1%
-3.590328227 1
< 0.1%
-3.585369303 1
< 0.1%
-3.549190203 1
< 0.1%
ValueCountFrequency (%)
1.711389827 1
< 0.1%
1.658697523 1
< 0.1%
1.597869841 1
< 0.1%
1.583846826 1
< 0.1%
1.58010057 1
< 0.1%
1.546966393 1
< 0.1%
1.541671879 1
< 0.1%
1.467665465 1
< 0.1%
1.465592368 1
< 0.1%
1.465054812 1
< 0.1%

BT s.e.
Real number (ℝ)

UNIQUE 

Distinct4724
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.49121615
Minimum0
Maximum0.6496713
Zeros1
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size37.0 KiB

Quantile statistics

Minimum0
5-th percentile0.45083726
Q10.46866285
median0.48445233
Q30.50614283
95-th percentile0.55662515
Maximum0.6496713
Range0.6496713
Interquartile range (IQR)0.037479982

Descriptive statistics

Standard deviation0.033998652
Coefficient of variation (CV)0.069213221
Kurtosis11.758801
Mean0.49121615
Median Absolute Deviation (MAD)0.018108844
Skewness0.71895997
Sum2320.5051
Variance0.0011559083
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.464009046 1
 
< 0.1%
0.514694232 1
 
< 0.1%
0.600151746 1
 
< 0.1%
0.484693766 1
 
< 0.1%
0.46455111 1
 
< 0.1%
0.479176332 1
 
< 0.1%
0.526451773 1
 
< 0.1%
0.507193133 1
 
< 0.1%
0.481407983 1
 
< 0.1%
0.462594537 1
 
< 0.1%
Other values (4714) 4714
99.8%
ValueCountFrequency (%)
0 1
< 0.1%
0.427220021 1
< 0.1%
0.428232657 1
< 0.1%
0.430425066 1
< 0.1%
0.43129656 1
< 0.1%
0.431815319 1
< 0.1%
0.433000257 1
< 0.1%
0.433103135 1
< 0.1%
0.43370786 1
< 0.1%
0.434138091 1
< 0.1%
ValueCountFrequency (%)
0.649671297 1
< 0.1%
0.649028675 1
< 0.1%
0.648732745 1
< 0.1%
0.648481117 1
< 0.1%
0.648473916 1
< 0.1%
0.648174341 1
< 0.1%
0.64783414 1
< 0.1%
0.646942357 1
< 0.1%
0.646906876 1
< 0.1%
0.646899678 1
< 0.1%

Kaggle split
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size37.0 KiB
Train
2834 
Test
1890 

Length

Max length5
Median length5
Mean length4.5999153
Min length4

Characters and Unicode

Total characters21730
Distinct characters8
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowTrain
2nd rowTrain
3rd rowTrain
4th rowTest
5th rowTrain

Common Values

ValueCountFrequency (%)
Train 2834
60.0%
Test 1890
40.0%
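
The length and character statistics for Kaggle split follow directly from the two label counts. The sketch below derives the total characters, mean length and percentage shares. The variable names are illustrative.

```python
# Label -> count pairs copied from the Common Values table above.
counts = {"Train": 2834, "Test": 1890}

n = sum(counts.values())
# Each row contributes len(label) characters.
total_chars = sum(len(label) * c for label, c in counts.items())
mean_length = total_chars / n
shares = {label: round(100 * c / n, 1) for label, c in counts.items()}

print(n, total_chars, round(mean_length, 7), shares)
```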

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
train 2834
60.0%
test 1890
40.0%

Most occurring characters

ValueCountFrequency (%)
T 4724
21.7%
r 2834
13.0%
a 2834
13.0%
i 2834
13.0%
n 2834
13.0%
e 1890
8.7%
s 1890
8.7%
t 1890
8.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17006
78.3%
Uppercase Letter 4724
 
21.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 2834
16.7%
a 2834
16.7%
i 2834
16.7%
n 2834
16.7%
e 1890
11.1%
s 1890
11.1%
t 1890
11.1%
Uppercase Letter
ValueCountFrequency (%)
T 4724
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 21730
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 4724
21.7%
r 2834
13.0%
a 2834
13.0%
i 2834
13.0%
n 2834
13.0%
e 1890
8.7%
s 1890
8.7%
t 1890
8.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21730
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 4724
21.7%
r 2834
13.0%
a 2834
13.0%
i 2834
13.0%
n 2834
13.0%
e 1890
8.7%
s 1890
8.7%
t 1890
8.7%

Interactions

[Pairwise interaction plots for the numeric variables (100 panels); images not preserved in this text version.]

Correlations

Variable | BT Easiness | BT s.e. | British WC | Category | Google WC | ID | Joon WC v1 | Kaggle split | Location | MPAA Max | Paragraphs | Pub Year | Sentence Count v1 | Sentence Count v2 | Source
BT Easiness | 1.000 | -0.048 | -0.117 | 0.303 | -0.118 | -0.048 | -0.031 | 0.000 | 0.065 | 0.064 | 0.248 | 0.127 | 0.376 | 0.368 | 0.147
BT s.e. | -0.048 | 1.000 | -0.010 | 0.069 | 0.013 | 0.044 | 0.009 | 0.000 | 0.019 | 0.000 | -0.013 | -0.048 | 0.014 | 0.008 | 0.033
British WC | -0.117 | -0.010 | 1.000 | 0.039 | 0.049 | 0.016 | 0.036 | 0.000 | 0.061 | 0.000 | -0.019 | -0.037 | -0.024 | -0.020 | 0.039
Category | 0.303 | 0.069 | 0.039 | 1.000 | -0.038 | 0.238 | 0.067 | 0.000 | 0.148 | 0.093 | 0.161 | -0.392 | 0.103 | 0.119 | 0.635
Google WC | -0.118 | 0.013 | 0.049 | -0.038 | 1.000 | -0.036 | 0.953 | 0.043 | 0.107 | 0.044 | 0.021 | 0.001 | 0.246 | 0.251 | 0.102
ID | -0.048 | 0.044 | 0.016 | 0.238 | -0.036 | 1.000 | -0.007 | 0.083 | 0.209 | 0.195 | 0.062 | -0.492 | -0.133 | -0.130 | 0.397
Joon WC v1 | -0.031 | 0.009 | 0.036 | 0.067 | 0.953 | -0.007 | 1.000 | 0.000 | 0.085 | 0.031 | 0.167 | -0.021 | 0.360 | 0.367 | 0.121
Kaggle split | 0.000 | 0.000 | 0.000 | 0.000 | 0.043 | 0.083 | 0.000 | 1.000 | 0.025 | 0.029 | -0.031 | 0.009 | -0.029 | -0.034 | 0.113
Location | 0.065 | 0.019 | 0.061 | 0.148 | 0.107 | 0.209 | 0.085 | 0.025 | 1.000 | 0.000 | 0.103 | 0.230 | 0.082 | 0.080 | 0.276
MPAA Max | 0.064 | 0.000 | 0.000 | 0.093 | 0.044 | 0.195 | 0.031 | 0.029 | 0.000 | 1.000 | -0.023 | 0.092 | -0.011 | -0.013 | 0.069
Paragraphs | 0.248 | -0.013 | -0.019 | 0.161 | 0.021 | 0.062 | 0.167 | -0.031 | 0.103 | -0.023 | 1.000 | -0.014 | 0.338 | 0.334 | 0.143
Pub Year | 0.127 | -0.048 | -0.037 | -0.392 | 0.001 | -0.492 | -0.021 | 0.009 | 0.230 | 0.092 | -0.014 | 1.000 | 0.280 | 0.270 | 0.316
Sentence Count v1 | 0.376 | 0.014 | -0.024 | 0.103 | 0.246 | -0.133 | 0.360 | -0.029 | 0.082 | -0.011 | 0.338 | 0.280 | 1.000 | 0.976 | 0.280
Sentence Count v2 | 0.368 | 0.008 | -0.020 | 0.119 | 0.251 | -0.130 | 0.367 | -0.034 | 0.080 | -0.013 | 0.334 | 0.270 | 0.976 | 1.000 | 0.283
Source | 0.147 | 0.033 | 0.039 | 0.635 | 0.102 | 0.397 | 0.121 | 0.113 | 0.276 | 0.069 | 0.143 | 0.316 | 0.280 | 0.283 | 1.000
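
One way to read a matrix like this is to filter its off-diagonal entries by absolute value. The sketch below does so for a handful of entries copied from the matrix. The 0.5 cutoff is an assumption, not a setting read from the report; it is used here because it singles out the same three pairs that the report's alerts flag as highly correlated.

```python
# A few off-diagonal entries copied from the correlation matrix.
correlations = {
    ("Google WC", "Joon WC v1"): 0.953,
    ("Sentence Count v1", "Sentence Count v2"): 0.976,
    ("Category", "Source"): 0.635,
    ("ID", "Pub Year"): -0.492,
    ("BT Easiness", "Sentence Count v1"): 0.376,
}

THRESHOLD = 0.5  # assumed cutoff, not taken from the report

# Keep pairs whose absolute correlation reaches the cutoff, strongest first.
high = sorted(
    (pair for pair, r in correlations.items() if abs(r) >= THRESHOLD),
    key=lambda pair: -abs(correlations[pair]),
)

for left, right in high:
    print(f"{left} ~ {right}: {correlations[(left, right)]:+.3f}")
```

Under this cutoff, ID and Pub Year (-0.492) fall just short, which is consistent with that pair not appearing in the alerts.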

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
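
Nullity correlation can be approximated as a Pearson correlation between 0/1 "is missing" indicators of two columns. The sketch below is a minimal standard-library version under that assumption; it is not the report's own code, and the toy columns are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def nullity_corr(col_x, col_y):
    """Correlate the missingness patterns (None = missing) of two columns."""
    miss_x = [1 if v is None else 0 for v in col_x]
    miss_y = [1 if v is None else 0 for v in col_y]
    return pearson(miss_x, miss_y)


# Toy columns (hypothetical data); None marks a missing cell.
col_a = ["grey", None, None, "axe", None, "colour"]
col_b = [1.0, None, None, 2.0, None, 3.0]   # missing exactly where col_a is
col_c = [None, 5, 6, None, 7, None]         # present exactly where col_a is missing

print(nullity_corr(col_a, col_b))  # close to +1: the columns go missing together
print(nullity_corr(col_a, col_c))  # close to -1: one is present when the other is not
```

A column with no missing values (or with all values missing) has a constant indicator, so the denominator is zero and the function would raise `ZeroDivisionError`; such columns have no defined nullity correlation.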

Sample

ID | Author | Title | Source | Pub Year | Category | Location | MPAA Max | Excerpt | Google WC | Joon WC v1 | British WC | British Words | Sentence Count v1 | Sentence Count v2 | Paragraphs | BT Easiness | BT s.e. | Kaggle split
0400Carolyn WellsPatty's Suitorsgutenberg1914.0LitmidGWhen the young people returned to the ballroom, it presented a decidedly changed appearance. Instead of an interior scene, it was a winter landscape.\nThe floor was covered with snow-white canvas, not laid on smoothly, but rumpled over bumps and hillocks, like a real snow field. The numerous palms and evergreens that had decorated the room, were powdered with flour and strewn with tufts of cotton, like snow. Also diamond dust had been lightly sprinkled on them, and glittering crystal icicles hung from the branches.\nAt each end of the room, on the wall, hung a beautiful bear-skin rug.\nThese rugs were for prizes, one for the girls and one for the boys. And this was the game.\nThe girls were gathered at one end of the room and the boys at the other, and one end was called the North Pole, and the other the South Pole. Each player was given a small flag which they were to plant on reaching the Pole.\nThis would have been an easy matter, but each traveller was obliged to wear snowshoes.1741791traveller11116-0.3402590.464009Train
1401Carolyn WellsTwo Little Women on a Holidaygutenberg1917.0LitmidPGAll through dinner time, Mrs. Fayre was somewhat silent, her eyes resting on Dolly with a wistful, uncertain expression. She wanted to give the child the pleasure she craved, but she had hard work to bring herself to the point of overcoming her own objections.\nAt last, however, when the meal was nearly over, she smiled at her little daughter, and said, "All right, Dolly, you may go."\n"Oh, mother!" Dolly cried, overwhelmed with sudden delight. "Really?\nOh, I am so glad! Are you sure you're willing?"\n"I've persuaded myself to be willing, against my will," returned Mrs. Fayre, whimsically. "I confess I just hate to have you go, but I can't bear to deprive you of the pleasure trip. And, as you say, it would also keep Dotty at home, and so, altogether, I think I shall have to give in."\n"Oh, you angel mother! You blessed lady! How good you are!" And Dolly flew around the table and gave her mother a hug that nearly suffocated her.1641840NaN15156-0.3153720.480805Train
2402Carolyn WellsPatty Blossomgutenberg1917.0LitmidPGAs Roger had predicted, the snow departed as quickly as it came, and two days after their sleigh ride there was scarcely a vestige of white on the ground. Tennis was again possible and a great game was in progress on the court at Pine Laurel. Patty and Roger were playing against Elise and Sam Blaney, and the pairs were well matched.\nBut the long-contested victory finally went against Patty, and she laughingly accepted defeat.\n"Only because Patty's not quite back on her game yet," Roger defended; "this child has been on the sick list, you know, Sam, and she isn't up to her own mark."\n"Well, I like that!" cried Patty; "suppose you bear half the blame, Roger. You see, Mr. Blaney, he is so absorbed in his own Love Game, he can't play with his old-time skill."\n"All right, Patsy, let it go at that. And it's so, too. I suddenly remembered something Mona told me to tell you, and it affected my service."1621800NaN11115-0.5801180.476676Train
3403CHARLES KINGSLEYTHE WATER-BABIES\nA Fairy Tale for a Land-Babygutenberg1863.0LitmidPG-13Mr. Grimes was to come up next morning to Sir John Harthover's, at the Place, for his old chimney-sweep was gone to prison, and the chimneys wanted sweeping. And so he rode away, not giving Tom time to ask what the sweep had gone to prison for, which was a matter of interest to Tom, as he had been in prison once or twice himself. Moreover, the groom looked so very neat and clean, with his drab gaiters, drab breeches, drab jacket, snow-white tie with a smart pin in it, and clean round ruddy face, that Tom was offended and disgusted at his appearance, and considered him a stuck-up fellow, who gave himself airs because he wore smart clothes, and other people paid for them; and went behind the wall to fetch the half-brick after all; but did not, remembering that he had come in the way of business, and was, as it were, under a flag of truce.1591600NaN331-1.7859650.526599Test
4404Charles KingsleyHOW THE ARGONAUTS WERE DRIVEN INTO THE UNKNOWN SEAgutenberg1889.0LitmidPGAnd outside before the palace a great garden was walled round, filled full of stately fruit-trees, gray olives and sweet figs, and pomegranates, pears, and apples, which bore the whole year round. For the rich south-west wind fed them, till pear grew ripe on pear, fig on fig, and grape on grape, all the winter and the spring. And at the farther end gay flower-beds bloomed through all seasons of the year; and two fair fountains rose, and ran, one through the garden grounds, and one beneath the palace gate, to water all the town. Such noble gifts the heavens had given to Alcinous the wise.\nSo they went in, and saw him sitting, like Poseidon, on his throne, with his golden sceptre by him, in garments stiff with gold, and in his hand a sculptured goblet, as he pledged the merchant kings; and beside him stood Arete, his wise and lovely queen, and leaned against a pillar as she spun her golden threads.1631641sceptre552-1.0540130.450007Train
5405Charles Madison Curry\n Erle Elsworth ClippingerThe Three Little Bearsgutenberg1920.0LitmidGOnce upon a time there were Three Bears who lived together in a house of their own in a wood. One of them was a Little, Small, Wee Bear; and one was a Middle-sized Bear, and the other was a Great, Huge Bear. They had each a pot for their porridge; a little pot for the Little, Small, Wee Bear; and a middle-sized pot for the Middle Bear; and a great pot for the Great, Huge Bear. And they had each a chair to sit in; a little chair for the Little, Small, Wee Bear; and a middle-sized chair for the Middle Bear; and a great chair for the Great, Huge Bear. And they had each a bed to sleep in; a little bed for the Little, Small, Wee Bear; and a middle-sized bed for the Middle Bear; and a great bed for the Great, Huge Bear.1471470NaN5510.2471970.510845Train
6406Clair W. HayesThe Boy Allies On the Firing Line\n Or, Twelve Days Battle Along the Marnegutenberg1915.0LitmidPGHal and Chester found ample time to take an inventory of the general's car. It was a huge machine, and besides being fitted up luxuriously was also furnished as an office, that the general might still be at work while he hurried from one part of the field to another when events demanded his immediate presence. Even now, with treachery threatening, and whirling along at a terrific speed, General Joffre, probably because of habit, fell to work sorting papers, studying maps and other drawings.\nFor almost two hours the car whirled along at top speed, and at length pulled up in the rear of an immense body of troops, who, even to Hal and Chester, could be seen preparing for an advance. General Joffre was out of the car before it came to a full stop, and Hal and Chester were at his heels. An orderly approached.\n"My respects to General Tromp, and tell him I desire his presence immediately," ordered General Joffre.1611660NaN773-0.8618090.480936Train
7 | 407 | Clair W. Hayes | The Boy Allies in Great Peril | gutenberg | 1916.0 | Lit | mid | PG-13 | Hal Paine and Chester Crawford were typical American boys. With the former's mother, they had been in Berlin when the great European conflagration broke out and had been stranded there. Mrs. Paine had been able to get out of the country, but Hal and Chester were left behind.\nIn company with Major Raoul Derevaux, a Frenchman, and Captain Harry Anderson, an Englishman, they finally made their way into Belgium, where they arrived in time to take part in the heroic defense of Liége in the early stages of the war. Here they rendered such invaluable service to the Belgian commander that they were commissioned lieutenants in the little army of King Albert.\nBoth in fighting and in scouting they had proven their worth. Following the first Belgian campaign, the two lads had seen service with the British troops on the continent, where they were attached to the staff of General Sir John French, in command of the English forces. Also they had won the respect and admiration of General Joffre, the French commander-in-chief. | 171 | 174 | 0 | NaN | 8 | 8 | 3 | -1.759061 | 0.476507 | Train
8 | 408 | Clair W. Hayes | The Boy Allies At Verdun | gutenberg | 1917.0 | Lit | start | PG | On the twenty-second of February, 1916, an automobile sped northward along the French battle line that for almost two years had held back the armies of the German emperor, strive as they would to win their way farther into the heart of France. For months the opposing forces had battled to a draw from the North Sea to the boundary of Switzerland, until now, as the day waned—it was almost six o'clock—the hands of time drew closer and closer to the hour that was to mark the opening of the most bitter and destructive battle of the war, up to this time.\nIt was the eve of the battle of Verdun.\nThe occupants of the automobile as it sped northward numbered three. In the front seat, alone at the driver's wheel, a young man bent low. He was garbed in the uniform of a British lieutenant of cavalry. Close inspection would have revealed the fact that the young man was a youth of some eighteen years, fair and good to look upon. | 170 | 173 | 0 | NaN | 7 | 8 | 3 | -0.952325 | 0.498116 | Train
9 | 409 | Claude A. Labelle | The Ranger Boys and the Border Smugglers | gutenberg | 1922.0 | Lit | mid | PG | The boys left the capitol and made their way down the long hill to the main business part of the town. As they struck onto the main business street, Garry noticed the familiar blue bell sign of the telephone company.\n"Say, boys, I have an idea. Let's stop in here and put in long distance calls and say hello to our folks. How does the idea strike you?" said Garry, almost in one breath.\n"Ripping," shouted Phil, while Dick didn't wait to make any remark, but dived in through the door, and in a trice was putting in his call. Phil followed suit, while Garry waited, as he would talk when Dick had finished.\nThis pleasant duty done, they went to a restaurant for dinner. Here they attracted no little attention, for their khaki clothes looked almost like uniforms. Added to this was the fact that they wore forest shoepacks, those high laced moccasins with an extra leather sole, and felt campaign hats. | 160 | 169 | 0 | NaN | 11 | 10 | 4 | -0.371641 | 0.463710 | Train
(index) | ID | Author | Title | Source | Pub Year | Category | Location | MPAA Max | Excerpt | Google WC | Joon WC v1 | British WC | British Words | Sentence Count v1 | Sentence Count v2 | Paragraphs | BT Easiness | BT s.e. | Kaggle split
4714 | 8022 | original text by Steve Whitt\nadapted by Jessica Fries-Gaither | A Tundra Tale | static.ehe.osu.edu | 2008.0 | Info | start | G | Near the top of the world is land called tundra. The tundra is flat and has no trees. It is covered by snow and ice most of the year. \nIn the spring, the snow and ice melt. Beneath the ground, the soil stays frozen. The ground gets very soggy. It is a marsh.\nSmall yellow flowers grow from the cold, wet ground. They are called marsh marigolds.\nFlies hide in the flowers. They soak up the Sun’s energy and get warm.\nThe flies fly from flower to flower. They help the flowers make seeds.\nCaribou eat the flowers. The caribou also give the plants the nutrients they need to grow.\nMother flies lay their eggs inside the caribou’s nose. It is warm there. The young flies eat and grow.\nThe young flies get bigger. AH-CHOO! The caribou sneezes. The flies land on the ground. Soon, they will be adults.\nThese plants and animals need each other. Can you think of others? | 154 | 161 | 0 | NaN | 25 | 25 | 9 | 0.608108 | 0.505921 | Train
4715 | 8023 | Stephen Whitt | Dinosaurs in the Dark | beyondpenguins | 2008.0 | Info | start | G | When you think of dinosaurs and where they lived, what do you picture? Do you see hot, steamy swamps, thick jungles, or sunny plains? Dinosaurs lived in those places, yes. But did you know that some dinosaurs lived in the cold and the darkness near the North and South Poles?\nThis surprised scientists, too. Paleontologists used to believe that dinosaurs lived only in the warmest parts of the world. They thought that dinosaurs could only have lived in places where turtles, crocodiles, and snakes live today. Later, these dinosaur scientists began finding bones in surprising places.\nOne of those surprising fossil beds is a place called Dinosaur Cove, Australia. One hundred million years ago, Australia was connected to Antarctica. Both continents were located near the South Pole. Today, paleontologists dig dinosaur fossils out of the ground. They think about what those ancient bones must mean. | 143 | 145 | 0 | NaN | 13 | 13 | 3 | 1.711390 | 0.646900 | Train
4716 | 8024 | wikijunior | Introduction to The Elements | wikibooks | 2013.0 | Info | start | G | The whole universe is built of matter. Right now, you are surrounded by it. The air we breathe is matter, and all the things you see around you are matter. The odors you smell are matter and the sounds you hear are caused by the movement of matter in your ears.\nMatter is everything that takes up space and has weight. Scientists say that matter has volume and mass. Matter is made up of tiny building blocks called atoms. The purest type of atom is called an element. The elements are what give matter its different qualities.\nToday we can see atoms by using a special instrument called an electron microscope. An electron microscope lets us see things that are millions of times smaller than the things we can see with a powerful optical microscope.\nMost of the matter around us has more than one element in it. But some matter is made up of just one element. If you have ever held a diamond, for example, it is made of just one element, Carbon. | 172 | 175 | 0 | NaN | 14 | 14 | 4 | 0.650829 | 0.544809 | Test
4717 | 8025 | wikijunior | Solid Basics | wikibooks | 2019.0 | Info | start | G | So what is a solid? Solids are usually hard because their molecules have been packed together. The closer your molecules are, the harder you are. Solids also can hold their own shape. A rock will always look like a rock unless something happens to it. The same goes for a diamond. Even when you grind up a solid into a powder, you will see tiny little pieces of that solid under a microscope. Liquids will move and fill up any container. Solids keep their shape.\nIn the same way that a solid holds its shape, the atoms inside of a solid are not allowed to move around too much. This is one of the physical characteristics of solids. Atoms and molecules in liquids and gases are bouncing and floating around, free to move where they want. The molecules in a solid are stuck in place. The atoms still spin and the electrons will still fly around, but the entire atom will not change position. | 163 | 164 | 0 | NaN | 14 | 14 | 2 | 0.189476 | 0.535648 | Train
4718 | 8026 | wikijunior | Liquid Basics | wikibooks | 2020.0 | Info | start | G | The second state of matter we will discuss is a liquid. Solids are hard things you can hold. Gases are floating around you and in bubbles. What is a liquid? Water is a liquid. Your blood is a liquid. Liquids are an in-between state of matter. They can be found in between the solid and gas states. They don't have to be made up of the same compounds. If you have a variety of materials in a liquid, it is called a solution.\nOne characteristic of a liquid is that it will fill up the shape of a container. If you pour some water in a cup, it will fill up the bottom of the cup first and then fill the rest. The water will also take the shape of the cup. It fills the bottom first because of gravity. The top part of a liquid will usually have a flat surface. That flat surface is because of gravity too. Putting an ice cube (solid) into a cup will leave you with a cube in the middle of the cup; the shape won't change until the ice becomes a liquid. | 189 | 190 | 0 | NaN | 17 | 17 | 2 | 0.255209 | 0.483866 | Train
4719 | 8027 | wikijunior | Bugs/Monarch butterfly | wikibooks | 2019.0 | Info | start | G | The name Monarch means “king”. An adult Monarch Butterfly is about 1 ½ inches long. Its body is black with white markings. There are white spots on the head and around the wing edges. The wings are bright orange with black veins. The undersides of the wings are light orange. Male Monarchs have a black spot on the back of each hind wing.\nWings have 2 parts: a forewing and a hind wing. The wing span can be up to 4 inches across. The back edges of the wings are called “margins”. They bend to push air backward and move the butterfly forward. The stiff front edges of the wings lift the butterfly in flight. Black veins create a framework that keeps the wings stable. Female wing veins are thicker than those of males.\nMonarch Butterflies come from yellow, black, and white striped caterpillars. Monarch caterpillars grow to about 2 inches in length. They have 2 tentacles that look like antennae at the front of the body, and 2 tentacles at the back. | 171 | 173 | 0 | NaN | 17 | 17 | 3 | 0.423388 | 0.511439 | Test
4720 | 8028 | wikijunior | Bugs/Walking Stick | wikibooks | 2020.0 | Info | start | G | Walking Sticks are long, thin, and slow-moving bugs, that looks like a stick, twig or branch. They are also called walking sticks. Males tend to be smaller than females. The colors are usually brown or green, but may be grey or shades of red. Also some are shaded orange, but in little places. Stripes, spots, and speckles are more common than solid. Males usually have wings, but females are most likely wingless. Short, tough forewings protect the larger fan-shaped hind wings.\nThe common American Walking Stick is slender and shiny with long antennas. The adult male is 2 to 3 inches long with bands of color,while the adult female is 4 to 5 inches long.\nThe New Guinea Spiny Stick Insect is big and bulky. It can grow to 4-1/2 inches to 6 inches long. It resembles a branch more than a slender stick. The colors are dark brown to black. Their legs are thick and prickly. Adult males have a long thorn on each hind leg. Nymphs, another type new type of walking stick, have green-and-brown patterns. | 176 | 178 | 0 | NaN | 17 | 17 | 3 | -0.614142 | 0.475506 | Test
4721 | 8029 | wikijunior | Bugs/Black Widow | wikibooks | 2020.0 | Info | start | G | A Black Widow is a shiny black spider. It has an orange or red mark that looks like an hourglass. Its abdomen is shaped like a sphere and has an hourglass mark on the bottom. Often there are just two red marks separated by black. Females sometimes have the hourglass shape on top of the abdomen above the silk-spinning organs (spinnerets). Females are usually about 1-1/2 inches long including their leg span. In areas where grapes grow, females are very small and round. They resemble shiny black or red grapes.\nMale Black Widows are much smaller than females. Their bodies are only about 1/4 inch long. They can be either gray or black. They do not have an hourglass mark, but may have red spots on the abdomen.\nBlack widows are sometimes called “comb-footed” spiders. The bristles on their hind legs are used to cover trapped prey with silk.\nYoung spiders are called “spiderlings”. They shed their outer covering (exoskeleton) as they grow. Spiderlings are orange, brown, or white at first and get darker each time they shed their skin (molt). | 178 | 181 | 0 | NaN | 17 | 17 | 4 | 0.310336 | 0.508939 | Test
4722 | 8030 | wikijunior | Solids | wikibooks | 2014.0 | Info | start | G | Solids are shapes that you can actually touch. They have three dimensions, which means that the have length, width and height. These shapes are what make up our daily life, and are very useful. Points on a solid must not be coplanar or colinear. The edge of solids are called the edge, and the surfaces are called faces. The corners, like angles and plane figures, are called vertices.\nA solid with only straight edges is called a polyhedron(pol-ee-HEE-dron). The plural form of polehedron is polyhedra(pol-ee-HEE-drah). Your chocolate bars are polyhedra, The Great Pyramids are polyhedra – a lot of things are. We will go into detail about them later.\nWhen dealing with these solid figures, there are two measurements we will need to know: the total surface area and the volume. The former is the sum of the faces of the solid; the latter is how big the solid is. | 148 | 150 | 0 | NaN | 12 | 12 | 3 | -0.215279 | 0.514128 | Train
4723 | 8031 | wikijunior | Anials | wikibooks | 2018.0 | Info | start | G | Animals are made of many cells. They eat things and digest them inside. Most animals can move. Only animals have brains (though not even all animals do; jellyfish, for example, do not have brains).\nAnimals are found all over the earth. They dig in the ground, swim in the oceans, and fly in the sky.\nHumans are a type of animal. So are dogs, cats, cows, horses, frogs, fish, and so on and on.\nAnimals can be divided into two main groups, vertebrates and invertebrates. Vertebrates can be further divided into mammals, fish, birds, reptiles, and amphibians. Invertebrates can be divided into arthropods (like insects, spiders, and crabs), mollusks, sponges, several different kinds of worms, jellyfish — and quite a few other subgroups. There are at least thirty kinds of invertebrates, compared to the five kinds of vertebrates. Vertebrates have a backbone, while invertebrates do not. | 143 | 146 | 0 | NaN | 13 | 13 | 4 | 0.300779 | 0.512379 | Train